Firefly Algorithm with Mini Batch K-Means Entropy Measure for Clustering Heterogeneous Categorical Timber Data

Abstract

Clustering analysis is the process of identifying similar patterns in various types of data. Heterogeneous categorical data consists of ordinal, nominal, binary, and Likert scales. The clustering solution for heterogeneous data remains difficult because it involves partitioning complex, dissimilar features. It is necessary to find high-quality techniques that efficiently determine the significant features. This paper emphasizes using the firefly algorithm to reduce the distance gap between features and improve clustering performance. To obtain an optimal global clustering, we propose a hybrid of mini-batch k-means (MBK) clustering-based entropy measures (EM) with firefly algorithm optimization (FA). The study compares the performance of K-Means, Agglomerative, DBSCAN, and Affinity clustering models with EM and FA. The evaluation uses a variety of data from a timber perception survey dataset. In terms of performance, MBK+EM+FA is the superior and most effective clustering model. It achieves a higher accuracy of 96.3 percent, with a 97 percent F-measure, 98 percent precision, and [...] recall. Other external assessments revealed a Homogeneity (HOMO) of 79.14 percent, a Fowlkes-Mallows Index (FMI) of 93.07 percent, a Completeness (COMP) of 78.04 percent, and a V-Measure (VM) of 78.58 percent. MBK+EM+FA and MBK+EM took about 0.45 s and 0.35 s to compute, respectively. The excellent quality of the results does not justify such time constraints. Surprisingly, the model reduced the distance measure across all features. Future work could put the model to the test in a different domain.
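
The external assessments named in the abstract (HOMO, COMP, VM, FMI) have standard textbook definitions, and a small sketch may help make them concrete. The code below is an illustration only, not the authors' implementation: the function names are hypothetical, natural logarithms are assumed, and the labels in the usage note are toy values rather than the timber survey data.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from two paired label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    marginal = Counter(given)
    return -sum((c / n) * log(c / marginal[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(truth, pred):
    """Return (homogeneity, completeness, V-measure) for a clustering."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    # Homogeneity: each cluster contains members of a single class.
    homo = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    # Completeness: all members of a class fall in the same cluster.
    comp = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    # V-measure: harmonic mean of homogeneity and completeness.
    v = 0.0 if homo + comp == 0 else 2 * homo * comp / (homo + comp)
    return homo, comp, v


def pairs(counts):
    """Number of unordered pairs that can be drawn within groups of these sizes."""
    return sum(c * (c - 1) // 2 for c in counts)


def fowlkes_mallows(truth, pred):
    """Fowlkes-Mallows index: TP / sqrt((TP + FP) * (TP + FN)) over point pairs."""
    tp = pairs(Counter(zip(truth, pred)).values())
    same_pred = pairs(Counter(pred).values())    # TP + FP
    same_truth = pairs(Counter(truth).values())  # TP + FN
    if same_pred == 0 or same_truth == 0:
        return 0.0
    return tp / sqrt(same_pred * same_truth)
```

Calling `homogeneity_completeness_v([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect clustering whose cluster ids are merely permuted, so all three values come out as 1; these metrics depend only on how points are grouped, not on the label names.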

Similar resources

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...

An Improved K-means Algorithm for Clustering Categorical Data

Most of the earlier work on clustering is mainly focused on numerical data, the inherent geometric properties of which can be exploited to naturally define distance functions between the data points. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the...

Nested Mini-Batch K-Means

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically re...

Turbocharging Mini-Batch K-Means

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically re...
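
For orientation, the plain mini-batch k-means update of Sculley (2010), which the entries above build on, can be sketched as follows. This is a minimal illustration under stated assumptions, not the nested or distance-bounded variant: the function names, the default parameters, and the use of squared Euclidean distance on numeric tuples are all choices made here for the example.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def mini_batch_kmeans(data, k, batch_size=10, iterations=100, seed=0):
    """Return k centers fitted with per-center decaying learning rates."""
    rng = random.Random(seed)
    # Initialise the centers from k randomly chosen data points.
    centers = [list(p) for p in rng.sample(data, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(iterations):
        batch = rng.sample(data, min(batch_size, len(data)))
        # Assign the whole batch first, with the centers held fixed.
        assigned = [nearest(x, centers) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            # Move the center toward x; this keeps it a running mean.
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers
```

Because each update is a convex combination of the old center and a data point, every center stays inside the bounding box of the data. Categorical features, as in the timber survey paper, would need an encoding or a different dissimilarity measure before a sketch like this applies.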

Journal

Journal title: International Journal of Advanced Computer Science and Applications

Year: 2022

ISSN: 2158-107X, 2156-5570

DOI: https://doi.org/10.14569/ijacsa.2022.0130756